Design and adjustment of dependency measures
Abstract
Dependency measures are fundamental to a number of important applications in data mining and machine learning. They are used ubiquitously: for feature selection, for clustering comparison and validation, as splitting criteria in random forests, and to infer biological networks, to list a few. More generally, dependency measures serve three important tasks: detection, quantification, and ranking of dependencies. Because dependency measures are estimated on finite data sets, each of these tasks is challenging. This thesis proposes a series of contributions to improve performance on each of the three tasks. When differentiating between strong and weak relationships using information-theoretic measures, the estimation variance plays an important role: the higher the variance, the lower the chance of correctly ranking the relationships. In this thesis, we discuss the design of a dependency measure based on the normalized mutual information whose estimation is based on many random discretization grids. This approach allows us to reduce its estimation variance. We show that a small estimation variance for the grid estimator of mutual information is beneficial for achieving higher power, both when the task is detecting dependencies between variables and when it is ranking different noisy dependencies. Dependency measure estimates can be high purely by chance when the sample size is small, e.g. because of missing values, or when the dependency is estimated between categorical variables with many categories. These biases cause problems when the dependency must have an interpretable quantification and when ranking dependencies for feature selection. In this thesis, we formalize a framework to adjust dependency measures in order to correct for these biases. We apply our adjustments to existing dependency measures between variables and show how to achieve better interpretability in quantification.
For example, when a dependency measure is used to quantify the amount of noise on functional dependencies between variables, we experimentally demonstrate that adjusted measures have a more interpretable range of variation. Moreover, we demonstrate
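The two ideas in the abstract, averaging normalized mutual information over many random discretization grids and adjusting a measure for chance, can be illustrated with a minimal Python sketch. It is not the thesis's estimator. The function names, the way grids are drawn (random bin counts with cut points sampled from the data), the normalization by the larger marginal entropy, and the Monte Carlo permutation baseline are all assumptions made for illustration.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in nats) of an array of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def random_bins(v, rng, max_bins):
    """Discretize v with a random number of bins and random cut points (assumed scheme)."""
    k = int(rng.integers(2, max_bins + 1))
    cuts = np.sort(rng.choice(v, size=k - 1, replace=False))
    return np.searchsorted(cuts, v, side="right"), k

def grid_nmi(x, y, rng, max_bins=8):
    """Normalized mutual information of x and y on one random grid."""
    bx, kx = random_bins(x, rng, max_bins)
    by, ky = random_bins(y, rng, max_bins)
    joint = np.zeros((kx, ky))
    np.add.at(joint, (bx, by), 1)                 # contingency table of the grid
    hx, hy = entropy(joint.sum(axis=1)), entropy(joint.sum(axis=0))
    mi = hx + hy - entropy(joint.ravel())
    denom = max(hx, hy)                           # assumed normalization
    return mi / denom if denom > 0 else 0.0

def random_grid_nmi(x, y, n_grids=50, seed=0):
    """Average NMI over many random grids to reduce estimation variance."""
    rng = np.random.default_rng(seed)
    return float(np.mean([grid_nmi(x, y, rng) for _ in range(n_grids)]))

def adjusted_for_chance(x, y, n_perm=30, seed=0):
    """(raw - E0) / (1 - E0), with E0 estimated by shuffling y (Monte Carlo assumption)."""
    rng = np.random.default_rng(seed)
    raw = random_grid_nmi(x, y, seed=seed)
    null = float(np.mean([random_grid_nmi(x, rng.permutation(y), seed=seed)
                          for _ in range(n_perm)]))
    return (raw - null) / (1.0 - null)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    noisy_function = x ** 2 + 0.1 * rng.normal(size=200)
    independent = rng.normal(size=200)
    print("dependent:  ", adjusted_for_chance(x, noisy_function))
    print("independent:", adjusted_for_chance(x, independent))
```

Under these assumptions, the adjusted score of an independent pair should sit near zero, while the raw score stays positive because of finite-sample bias. That is the kind of interpretability gain the abstract describes.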